Reproducible data science environments with {rix}

Bruno Rodrigues

Intro: Who am I

Bruno Rodrigues, head of the statistics department at the Ministry of Research and Higher Education in Luxembourg

Intro: Luxembourg?

Intro: where to find the code

Slides available online:

https://b-rodrigues.github.io/repro_oak_ridge

Code available here:

https://github.com/b-rodrigues/repro_oak_ridge

What I’ll be talking about

The puzzle you know:

The puzzle with Nix:

Available solutions for R

  • {renv} or {groundhog}: simple to use, but:
    • They don’t install R itself ({renv} records the R version in renv.lock but does not restore it)
    • Installing old packages may fail (missing or incompatible system dependencies)
  • Docker goes further:
    • Manages R and system dependencies
    • Containers executable anywhere
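  • But:
    • Not inherently reproducible: rebuilding the same Dockerfile later can produce a different image (e.g. base image tags and package repositories move over time)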

The Nix package manager (1/2)

Package manager: tool for installing and managing packages

Package: any software (not just R packages)

A popular package manager:

Google Play Store

The Nix package manager (2/2)

  • To ensure reproducibility: R, R packages, and other dependencies must be explicitly managed
  • Nix is a package manager truly focused on reproducible builds
  • Nix manages everything using a single text file (called a Nix expression)!
  • These expressions always produce exactly the same result

rix: reproducible development environments with Nix (1/5)

  • {rix} simplifies writing Nix expressions!
  • Just use the provided rix() function:
library(rix)

rix(date = "2025-06-13",
    r_pkgs = c("dplyr", "ggplot2"),
    system_pkgs = NULL,
    git_pkgs = NULL,
    tex_pkgs = NULL,
    ide = "code",
    project_path = ".")

rix: reproducible development environments with Nix (2/5)

  • renv.lock files can also serve as a starting point:
library(rix)

renv2nix(
  renv_lock_path = "path/to/original/renv_project/renv.lock",
  project_path = "path/to/rix_project",
  override_r_ver = "4.4.1" # <- optional
)

rix: reproducible development environments with Nix (3/5)

  • List the R version and required packages
  • Optionally:
    • system packages, GitHub packages, or LaTeX packages
    • an IDE (RStudio, Radian, VS Code, or “other”)
    • a version of Python and Python packages to include
    • a version of Julia and Julia packages to include

rix: reproducible development environments with Nix (4/5)

  • rix::rix() generates a default.nix file
  • Build the expressions with nix-build (in terminal) or rix::nix_build() from R
  • Access the development environment with nix-shell
  • Expressions can be generated even without Nix installed (with some limitations)
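The generate, build, enter workflow above can be sketched from R as follows. This is a minimal sketch, not the demo code from the repository: the scratch directory name rix_demo is hypothetical, ide = "none" is used instead of an editor, and the build step is skipped when nix-build is not on the PATH. A first build downloads every package and can take a while.

```r
library(rix)

# Hypothetical scratch directory, so the sketch does not touch a real project
project_dir <- file.path(tempdir(), "rix_demo")
dir.create(project_dir, showWarnings = FALSE, recursive = TRUE)

# Step 1: generate default.nix (this step does not need Nix for CRAN packages)
rix(
  date = "2025-06-13",
  r_pkgs = c("dplyr", "ggplot2"),
  ide = "none",
  project_path = project_dir,
  overwrite = TRUE
)

nix_file <- file.path(project_dir, "default.nix")
file.exists(nix_file)

# Step 2: build the environment, but only if Nix is available on this machine
if (nzchar(Sys.which("nix-build"))) {
  nix_build(project_path = project_dir)
} else {
  message("Nix not found: default.nix was generated but not built")
}

# Step 3 happens in a terminal: run nix-shell from inside project_dir
```

Keeping generation and building separate means the default.nix file can be written on one machine (or in CI) and built on another.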

rix: reproducible development environments with Nix (5/5)

  • Can install specific versions of packages (write "dplyr@1.0.0")
  • Can install packages hosted on GitHub
  • Many vignettes are available to get started!

Polyglot environments with rix

  • rix() supports Python and Julia alongside R:
rix(date = "2025-06-09",
    r_pkgs = c("tidyr", "dplyr", "ggplot2", "languageserver"),
    py_conf = list(
      py_version = "3.13",
      py_pkgs = c("polars", "scikit-learn")
    ),
    ide = "none",
    project_path = ".",
    overwrite = TRUE)
  • Julia: use jl_conf = list(jl_version = "1.10", jl_pkgs = c(...))
  • See: scripts/nix_expressions/02_native_vscode_example/

Demo

  • Basics: scripts/nix_expressions/01_rix_intro/
  • Native VS Code/Positron on Windows: scripts/nix_expressions/02_native_vscode_example/
  • Nix and {targets}: scripts/nix_expressions/03_nix_targets_pipeline
  • Nix and Docker: scripts/nix_expressions/04_docker/
  • Nix and {shiny}: scripts/nix_expressions/05_shiny
  • GitHub Actions: see here

Polyglot pipelines with {rixpress}

  • {rixpress} allows chaining processing steps in R and Python
  • Uses {rix} to create a reproducible (via Nix) execution environment for the pipeline
  • Each pipeline step is a Nix derivation
  • Data transfer: automatic via reticulate or universal format (JSON)
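To illustrate the "universal format" idea from the last bullet, here is a small sketch of passing a data frame between two steps through a JSON file. It uses {jsonlite} directly and is only an assumption about the general principle. It does not show how {rixpress} itself wires encoders and decoders, and the scores data frame is made up for the example.

```r
library(jsonlite)

# Toy data standing in for the output of one pipeline step
scores <- data.frame(
  id = 1:3,
  group = c("a", "b", "a"),
  stringsAsFactors = FALSE
)

# "Producer" step: serialise the result to a JSON file
json_path <- tempfile(fileext = ".json")
write_json(scores, json_path)

# "Consumer" step: any language with a JSON parser could read this file;
# here we simply read it back into R
restored <- fromJSON(json_path)
str(restored)
```

JSON only carries plain values (numbers, strings, booleans, nested lists), so language-specific types such as factors or dates would need an explicit conversion on each side.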

An example of a polyglot pipeline

list(
  rxp_py_file(…),    # Read a CSV with Python
  rxp_py(…),         # Filter with Polars
  rxp_py2r(…),       # Python → R transfer
  rxp_r(…),          # Transformation in R
  rxp_r2py(…),       # R → Python transfer
  rxp_py(…),         # Another Python step
  rxp_py2r(…),       # Back to R
  rxp_r(…)           # Final step
) |> rixpress()
  • Each step is named and typed (py, r, r2py, etc.)
  • Ability to add files (functions.R, images…)

To learn more about rixpress:

The end

Contact me if you have questions:

Thanks!